My knowledge about CNNs is relatively basic. I know what they do, but I don't have enough experience to understand the different architectural choices (beyond the obvious improvements, like residual blocks). For my own work I've never needed to rely on CNNs, so whenever I worked with them I just reused existing CNN structures, but now I'd like to learn more about this, so that I can optimize my current models.
So, for example, looking at some CNNs used in reinforcement learning, here's one in particular (the CNN from the Atari DQN Nature paper):
```python
self.base = nn.Sequential(
    init_(nn.Conv2d(4, 32, kernel_size=8, stride=4, padding=0)),
    nn.ReLU(),
    init_(nn.Conv2d(32, 64, kernel_size=4, stride=2, padding=0)),
    nn.ReLU(),
    init_(nn.Conv2d(64, 32, kernel_size=3, stride=1, padding=0)),
    nn.ReLU(),
    Flatten(),
    init_(nn.Linear(32 * 7 * 7, outputs)),
    nn.ReLU(),
)
```
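(As a side note on where the `32 * 7 * 7` in the linear layer comes from: it can be checked with the standard convolution output-size formula. A quick sketch, assuming the 84x84 input described below:)

```python
def conv_out(size, kernel, stride, padding=0):
    # Standard conv output-size formula: floor((size + 2p - k) / s) + 1
    return (size + 2 * padding - kernel) // stride + 1

s = 84                  # 84x84 input frames
s = conv_out(s, 8, 4)   # after conv1: 20x20
s = conv_out(s, 4, 2)   # after conv2: 9x9
s = conv_out(s, 3, 1)   # after conv3: 7x7
print(s)                # 7 -> flattened size is 32 channels * 7 * 7 = 1568
```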
The input is a stack of 4 frames of 84x84 grayscale pixels. I understand that you'd want to divide this image into many smaller receptive fields, but what's the intuition behind doing it in this particular way, i.e. 3 convolutional layers? Why not 5? Why not 2, or even 1 with more outputs? In fact, it seems to me that a parallel approach with different kernel parameters, rather than a sequential one, would be superior, since it seems to me that information would get lost after the first layer, which uses a kernel size of 8.
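(To make the "parallel approach" concrete: I mean something Inception-like, where branches with different kernel sizes each see the raw frames and their outputs are concatenated. A purely illustrative sketch; the channel counts, strides, and paddings are my own assumptions, not from any paper:)

```python
import torch
import torch.nn as nn

class ParallelConv(nn.Module):
    """Illustrative multi-branch first layer: each branch sees the raw
    4-frame 84x84 input with a different kernel size, and the branch
    outputs are concatenated along the channel dimension."""
    def __init__(self, in_channels=4):
        super().__init__()
        # Strides/paddings chosen so both branches produce 21x21 maps
        self.branch_small = nn.Conv2d(in_channels, 16, kernel_size=4, stride=4, padding=0)
        self.branch_large = nn.Conv2d(in_channels, 16, kernel_size=8, stride=4, padding=2)
        self.relu = nn.ReLU()

    def forward(self, x):
        a = self.relu(self.branch_small(x))  # (N, 16, 21, 21)
        b = self.relu(self.branch_large(x))  # (N, 16, 21, 21)
        return torch.cat([a, b], dim=1)      # (N, 32, 21, 21)
```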
Are there any resources you could recommend that delve deeply into the nuances of designing CNN architectures?
Thank you
submitted by /u/NikEy